{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Theory"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Added more equations to fill up the intermediate steps of the ICA derivation from the lecture note."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Symbols**:\n",
    "\n",
    "* $S$: $n \\times m$, source audio, the one we try to recover with ICA. $n$: number of sources/microphones; $m$: number of time steps recorded.\n",
    "* $A$: $n \\times n$, mixing matrix, mixing sources in a linear fashion.\n",
    "* $X$: $n \\times m$, actual recorded audio after mixing\n",
    "* $W$: $n \\times n$, unmixing matrix, the one we try to learn from data $X$.\n",
    "\n",
    "**Relationships**:\n",
    "\n",
    "\\begin{align*}\n",
    "X &= A S\\\\\n",
    "S & = W X \\\\\n",
    "W &= A^{-1} \\\\\n",
    "\\end{align*}\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Convention**: \n",
    "\n",
    "* $w_j^T$ mean the $j$th row of $W$;\n",
    "* $x^{(i)}$ means the $i$th column of $X$, i.e. the $i$th sample;\n",
    "* $s_j^{(i)}$ means the element at the $j$th row and $i$th column (i.e. at time step $(i)$) of $S$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Each source $s_j^{(i)}$ is given by a density $p_s$, then\n",
    "\n",
    "$$ \n",
    "p(S) = \\prod_{i=1}^{m} \\prod_{j=1}^{n} p_s(s_j^{(i)}) \n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The density in terms of $X$ is (from linear algebra), which can also be interpreted as likelihood of $W$,\n",
    "\n",
    "$$ \n",
    "p(X) = p(W^{-1}S) = \\prod_{i=1}^{m} \\bigg (\\prod_{j=1}^{n} p_s(w_j^T x^{(i)}) \\cdot \\left|W\\right| \\bigg)  = L(X;W) \n",
    "$$\n",
    "\n",
    "Note that $\\left|W\\right|$ is OUTSIDE the second $\\prod$.\n",
    "\n",
    "**Note**: *I need to understand this step better in the future*."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The log-likelihood is \n",
    "\n",
    "$$\n",
    "\\ell(X; W) = \\sum_{i=1}^m \\bigg( \\sum_{j=1}^{n} \\log p_s(w_j^T x^{(i)}) + \\log \\left|W\\right| \\bigg)\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To calculate the density, a reasonable assumption for its CDF is the sigmoid function, e.g. $g(a) = \\frac{1}{1 + e^{-a}}$, so $p_s = g'$. Then\n",
    "\n",
    "$$\n",
    "p_s(w_j^T x^{(i)}) = g'(w_j^T x^{(i)}) \n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Replacing it back into $\\ell(X; W)$,"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$$\n",
    "\\ell(X; W) = \\sum_{i=1}^m\\sum_{j=1}^{n} \\bigg(\\log g'(w_j^T x^{(i)}) + \\log \\left|W\\right| \\bigg)\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Now derive the gradient of log-likelihood\n",
    "\n",
    "For clarity, set $g'(w_j^T x^{(i)}) = g(1 - g) = g - g^2$, which is a convenient property of the sigmoid function."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For $w_j^Tx^{(i)}$,\n",
    "\n",
    "\\begin{align*}\n",
    "\\nabla_W \\log g(1 - g) \n",
    "&= \\frac{(1 - 2g) g (1- g)}{g(1 - g)} \\nabla_W (w_j^T x^{(i)})  \\\\\n",
    "&= (1 - 2g)\\nabla_W (w_j^T x^{(i)})\n",
    "\\end{align*}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Inspect $\\nabla_W (w_j^T x^{(i)})$ closely, it's a $n\\times n$ matrix with all rows zero but $j$th row equaling to $x^{(i)}$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\\begin{align*}\n",
    "\\nabla_W (w_j^T x^{(i)}) = \\begin{bmatrix}\n",
    "0 & \\cdots & 0 \\\\ \n",
    "\\vdots & \\ddots & \\vdots \\\\\n",
    "x_0^{(i)} & \\cdots & x_n^{(i)} \\\\ \n",
    "\\vdots & \\ddots & \\vdots \\\\\n",
    "0 & \\cdots & 0 \\\\ \n",
    "\\end{bmatrix}\n",
    "\\end{align*}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Therefore, equivalently,\n",
    "\n",
    "\\begin{align*}\n",
    "\\sum_{j=1}^n (1 - 2g(w_j^T x^{(i)}))\\nabla_W (w_j^T x^{(i)}) = \\begin{bmatrix}\n",
    "1 - 2 g(w_1^T x^{(i)}) \\\\ \n",
    "1 - 2 g(w_2^T x^{(i)}) \\\\ \n",
    "\\vdots \\\\\n",
    "1 - 2 g(w_n^T x^{(i)}) \\\\ \n",
    "\\end{bmatrix} {x^{(i)}}^T\n",
    "\\end{align*}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "On the other hand,"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\\begin{align*}\n",
    "\\nabla_W \\log \\left|W\\right| = \\frac{\\left|W\\right| (W^T)^{-1}}{\\left|W\\right|} = (W^T)^{-1}\n",
    "\\end{align*}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Combining all pieces, we get the formula for gradient ascent as presented in the lecture note.\n",
    "\n",
    "\\begin{align*}\n",
    "\\nabla_W \\ell(X;W) = \\begin{bmatrix}\n",
    "1 - 2 g(w_1^T x^{(i)}) \\\\ \n",
    "1 - 2 g(w_2^T x^{(i)}) \\\\ \n",
    "\\vdots \\\\\n",
    "1 - 2 g(w_n^T x^{(i)}) \\\\ \n",
    "\\end{bmatrix} {x^{(i)}}^T + (W^T)^{-1}\n",
    "\\end{align*}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Note**: ICA only applies when the number of sources and that of microphones are the same."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Implementation"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Heads up**: matrices in numpy are row matrices, as opposed to the column matrices in the lecture notes"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "### Independent Components Analysis\n",
    "###\n",
    "### This program requires a working installation of:\n",
    "###\n",
    "### On Mac:\n",
    "###     1. portaudio: On Mac: brew install portaudio\n",
    "###     2. sounddevice: pip install sounddevice\n",
    "###\n",
    "### On windows:\n",
    "###      pip install pyaudio sounddevice\n",
    "###\n",
    "\n",
    "import sounddevice as sd\n",
    "import numpy as np\n",
    "\n",
    "Fs = 11025\n",
    "\n",
    "def normalize(dat):\n",
    "    return 0.99 * dat / np.max(np.abs(dat))\n",
    "\n",
    "def load_data():\n",
    "    mix = np.loadtxt('mix.dat')\n",
    "    return mix\n",
    "\n",
    "def play(vec):\n",
    "    sd.play(vec, Fs, blocking=True)\n",
    "\n",
    "def sigmoid(x):\n",
    "    \"\"\"\n",
    "    A numerically stable sigmoid function for the input x.\n",
    "    \n",
    "    It calculates positive and negative elements with different equations to \n",
    "    prevent overflow by avoid exponentiation with large positive exponent, \n",
    "    thus achieving numerical stability.\n",
    "    \n",
    "    For negative elements in x, sigmoid uses this equation\n",
    "    \n",
    "    $$sigmoid(x_i) = \\frac{e^{x_i}}{1 + e^{x_i}}$$\n",
    "    \n",
    "    For positive elements, it uses another equation:\n",
    "    \n",
    "    $$sigmoid(x_i) = \\frac{1}{e^{-x_i} + 1}$$\n",
    "    \n",
    "    The two equations are equivalent mathematically. \n",
    "    \n",
    "    x is of shape: B x H\n",
    "    \"\"\"\n",
    "    ### YOUR CODE HERE    \n",
    "    pos_mask = (x >= 0)\n",
    "    neg_mask = (x < 0)\n",
    "\n",
    "    # specify dtype! otherwise, it may all becomes zero, this could have different\n",
    "    # behaviors depending on numpy version\n",
    "    z = np.zeros_like(x, dtype=float)\n",
    "    z[pos_mask] = np.exp(-x[pos_mask])\n",
    "    z[neg_mask] = np.exp(x[neg_mask])\n",
    "\n",
    "    top = np.ones_like(x, dtype=float)\n",
    "    top[neg_mask] = z[neg_mask]\n",
    "    s = top / (1 + z)\n",
    "    ### END YOUR CODE\n",
    "    return s\n",
    "\n",
    "def unmixer(X):\n",
    "    # M: length\n",
    "    # N: number of microphones\n",
    "    M, N = X.shape\n",
    "    W = np.eye(N)\n",
    "    losses = []\n",
    "\n",
    "    anneal = [0.1, 0.1, 0.1, 0.05, 0.05, 0.05, 0.02, 0.02, 0.01, 0.01,\n",
    "              0.005, 0.005, 0.002, 0.002, 0.001, 0.001]\n",
    "    print('Separating tracks ...')\n",
    "    ######## Your code here ##########\n",
    "    for alpha in anneal:\n",
    "        print('working on alpha = {0}'.format(alpha))\n",
    "        for xi in X:\n",
    "            p1 = np.outer(1 - 2 * sigmoid(np.dot(W, xi.T)), xi)\n",
    "            p2 = np.linalg.inv(W.T)\n",
    "            W += alpha * (p1 + p2)\n",
    "    ###################################\n",
    "    return W\n",
    "\n",
    "def unmix(X, W):\n",
    "    ######### Your code here ##########\n",
    "    S = np.dot(X, W.T)\n",
    "    ##################################\n",
    "    return S"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "X = normalize(load_data())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Playing mixed track 0\n",
      "Playing mixed track 1\n",
      "Playing mixed track 2\n",
      "Playing mixed track 3\n",
      "Playing mixed track 4\n"
     ]
    }
   ],
   "source": [
    "for i in range(X.shape[1]):\n",
    "    print('Playing mixed track %d' % i)\n",
    "    play(X[:, i])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Separating tracks ...\n",
      "working on alpha = 0.1\n",
      "working on alpha = 0.1\n",
      "working on alpha = 0.1\n",
      "working on alpha = 0.05\n",
      "working on alpha = 0.05\n",
      "working on alpha = 0.05\n",
      "working on alpha = 0.02\n",
      "working on alpha = 0.02\n",
      "working on alpha = 0.01\n",
      "working on alpha = 0.01\n",
      "working on alpha = 0.005\n",
      "working on alpha = 0.005\n",
      "working on alpha = 0.002\n",
      "working on alpha = 0.002\n",
      "working on alpha = 0.001\n",
      "working on alpha = 0.001\n",
      "CPU times: user 43.6 s, sys: 781 ms, total: 44.4 s\n",
      "Wall time: 44 s\n"
     ]
    }
   ],
   "source": [
    "# Note, W and X are both row matrices\n",
    "%time W = unmixer(X)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "W_expected = np.array([[ 72.15081922,  28.62441682,  25.91040458, -17.2322227 , -21.191357  ],\n",
    "                       [ 13.45886116,  31.94398247,  -4.03003982, -24.0095722 , 11.89906179 ],\n",
    "                       [ 18.89688784,  -7.80435173,  28.71469558,  18.14356811, -21.17474522],\n",
    "                       [ -6.0119837 ,  -4.15743607,  -1.01692289,  13.87321073, -5.26252289 ],\n",
    "                       [ -8.74061186,  22.55821897,   9.61289023,  14.73637074, 45.28841827 ]])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "assert ((W - W_expected) < 1e-8).all()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "S = normalize(unmix(X, W))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(5, 5)"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "W.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(53442, 5)"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "X.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(53442, 5)"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "S.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Playing separated track 0\n",
      "Playing separated track 1\n",
      "Playing separated track 2\n",
      "Playing separated track 3\n",
      "Playing separated track 4\n"
     ]
    }
   ],
   "source": [
    "# Copied from http://cs229.stanford.edu/ps/ps4/q4/bellsej.m\n",
    "# % now have a listen --- You should have the following five samples:\n",
    "# % * Godfather\n",
    "# % * Southpark\n",
    "# % * Beethoven 5th\n",
    "# % * Austin Powers\n",
    "# % * Matrix (the movie, not the linear algebra construct :-) \n",
    "\n",
    "# I think the Beethoven 5th & Austin Powers are reordered!\n",
    "for i in range(S.shape[1]):\n",
    "    print('Playing separated track %d' % i)\n",
    "    play(S[:, i])"
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
